Picture for Jan Cernocky

Jan Cernocky

BUT Systems for Environmental Sound Deepfake Detection in the ESDD 2026 Challenge

Add code
Dec 09, 2025
Viaarxiv icon

Streaming Endpointer for Spoken Dialogue using Neural Audio Codecs and Label-Delayed Training

Add code
Jun 08, 2025
Viaarxiv icon

Fine-tune Before Structured Pruning: Towards Compact and Accurate Self-Supervised Models for Speaker Diarization

Add code
May 30, 2025
Viaarxiv icon

Probing Self-supervised Learning Models with Target Speech Extraction

Add code
Feb 17, 2024
Viaarxiv icon

Target Speech Extraction with Pre-trained Self-supervised Learning Models

Add code
Feb 17, 2024
Figure 1 for Target Speech Extraction with Pre-trained Self-supervised Learning Models
Figure 2 for Target Speech Extraction with Pre-trained Self-supervised Learning Models
Figure 3 for Target Speech Extraction with Pre-trained Self-supervised Learning Models
Figure 4 for Target Speech Extraction with Pre-trained Self-supervised Learning Models
Viaarxiv icon

DiaCorrect: Error Correction Back-end For Speaker Diarization

Add code
Sep 15, 2023
Figure 1 for DiaCorrect: Error Correction Back-end For Speaker Diarization
Figure 2 for DiaCorrect: Error Correction Back-end For Speaker Diarization
Figure 3 for DiaCorrect: Error Correction Back-end For Speaker Diarization
Figure 4 for DiaCorrect: Error Correction Back-end For Speaker Diarization
Viaarxiv icon

End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations

Add code
Aug 15, 2023
Figure 1 for End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations
Figure 2 for End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations
Figure 3 for End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations
Figure 4 for End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations
Viaarxiv icon

Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations

Add code
Oct 15, 2022
Figure 1 for Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations
Figure 2 for Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations
Figure 3 for Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations
Figure 4 for Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations
Viaarxiv icon

An attention-based backend allowing efficient fine-tuning of transformer models for speaker verification

Add code
Oct 03, 2022
Figure 1 for An attention-based backend allowing efficient fine-tuning of transformer models for speaker verification
Figure 2 for An attention-based backend allowing efficient fine-tuning of transformer models for speaker verification
Figure 3 for An attention-based backend allowing efficient fine-tuning of transformer models for speaker verification
Figure 4 for An attention-based backend allowing efficient fine-tuning of transformer models for speaker verification
Viaarxiv icon

DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation And Extraction

Add code
Dec 27, 2021
Figure 1 for DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation And Extraction
Figure 2 for DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation And Extraction
Figure 3 for DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation And Extraction
Figure 4 for DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation And Extraction
Viaarxiv icon